## Some of this is cribbed from https://www.dcc-servers.net/robots.txt


### Geographically Meh ### {{{
## No need for Chinese searches
User-agent: Baiduspider
Disallow: /

## Czech Republic
User-agent: SeznamBot
Disallow: /

## No need for Russian searches; Yandex fetches but ignores robots.txt
User-Agent: Yandex
Disallow: /
### End Geographically Meh ### }}}


### SEO Dung Bots ### {{{
## "The World's Experts in Search Analytics"
##   is yet another SEO outfit that hammers HTTP servers without permission
##   and without benefit for at least some HTTP server operators.
User-Agent: Searchmetrics
Disallow: /

## Claimed SEO; ignores robots.txt
User-Agent: lipperhey
Disallow: /

## Claimed SEO
User-Agent: dataprovider.com
Disallow: /

## SEO
## http://www.semrush.com/bot.html suggests its results are for users:
##   "Well, the real question is why do you not want the bot visiting
##   your page? Most bots are both harmless and quite beneficial. Bots
##   like Googlebot discover sites by following links from page to page.
##   This bot is crawling your page to help parse the content, so that
##   the relevant information contained within your site is easily indexed
##   and made more readily available to users searching for the content
##   you provide."
User-Agent: SemrushBot
Disallow: /

## SEO bs
User-agent: spbot
Disallow: /

## SEO bs
## Wasn't respecting 'dotbot' block...
User-agent: DotBot
Disallow: /

### End SEO Dung Bots ### }}}


### Poorly Implemented Crap Bots ### {{{
## Stupid bot
User-Agent: purebot
Disallow: /

## Seems to only search for non-existent pages.
##   See ezooms.bot@gmail.com and wowrack.com
User-Agent: Ezooms
Disallow: /

## http://www.majestic12.co.uk/bot.php?+ follows many bogus and corrupt links
##   and so generates a lot of error log noise.
##   It does us no good and is a waste of our bandwidth.
User-Agent: MJ12bot
Disallow: /

## There is no need to waste bandwidth on an outfit trying to monetize our
##   web pages.  $50 for data scraped from the web is too much.
##   It never bothers fetching robots.txt.
##   See http://www.domaintools.com
User-Agent: SurveyBot
Disallow: /
User-Agent: DomainTools
Disallow: /

## Too many mangled links and implausible home page
User-Agent: sitebot
Disallow: /

## At best another broken spider that thinks all URLs are at the top level.
##   At worst, a malware scanner.
##   Never fetches robots.txt, contrary to http://www.warebay.com/bot.html.
##   See SolomonoBot/1.02 (http://www.solomono.ru)
User-Agent: SolomonoBot
Disallow: /

## Yet another claimed search engine that generates bad links from plain text.
##   It fetches and then ignores robots.txt
##   188.138.48.235 http://www.warebay.com/bot.html
User-Agent: WBSearchBot
Disallow: /

## Ignores robots.txt
User-Agent: Sosospider
Disallow: /

## Does not handle protocol relative links.  It does not fetch robots.txt.
User-Agent: 360Spider
Disallow: /

## Does not handle protocol relative links.
User-Agent: 80legs
Disallow: /

## Does not know the difference between a hyperlink <A HREF="..."></A> and
##   anchors that are not links such as <A NAME="..."></A>
User-Agent: YamanaLab-Robot
Disallow: /

## Ignores rel="nofollow" in links.
##   Parses ...href='asdf' onclick='... (single quotes instead of double)
##   as if " onclick=..." were part of the URL.
##   It fetches robots.txt and then ignores it.
User-Agent: Aboundex
Disallow: /
User-Agent: Aboundexbot
Disallow: /

## Fetches robots.txt for only some domains.
##   It searches for non-existent but often abused URLs such as .../contact.cgi
User-Agent: yunyun
Disallow: /

## Multiple long crawls a day... and .ru
User-Agent: MegaIndex.ru
Disallow: /
### End Poorly Implemented Crap Bots ### }}}


### Waste of Bandwidth ### {{{
## Monetizers of other people's bandwidth.
User-Agent: Exabot
Disallow: /
## Monetizers of other people's bandwidth.
User-Agent: findlinks
Disallow: /
## Monetizers of other people's bandwidth.
User-Agent: aiHitBot
Disallow: /
## Monetizer of other people's bandwidth. It ignores robots.txt.
User-Agent: AhrefsBot
Disallow: /

## Yet another monetizer of other people's bandwidth that hits selected
##   pages every few seconds from about a dozen HTTP clients around the
##   world without let, leave, hindrance, or notice.
##   There is no apparent way to ask them to stop.  One DinoPing agent at
##   support@edis.at responded to a request to stop with "just use iptables"
##   on 2012/08/13.
##   They're blind to the irony that one of their targets is
##   <A HREF="that-which-we-dont.html">http://www.rhyolite.com/anti-spam/that-which-we-dont.html</A>
User-Agent: DinoPing
Disallow: /

## Waste of bandwidth
User-Agent: masscan
Disallow: /

## Waste of bandwidth
User-Agent: escan
Disallow: /

## No apparent reason to spend bandwidth or attention on its bad URLs in logs
User-Agent: discoverybot
Disallow: /

## Unasked for tracking. Monetizes
User-agent: Uptimebot
Disallow: /

### End Waste of Bandwidth ### }}}


### Get Off My Lawn ### {{{
## Cutesy story is years stale and no longer excuses bad crawling
User-Agent: dotnetdotcom
Disallow: /

## Cutesy story is years stale and no longer excuses bad crawling
User-Agent: dotbot
Disallow: /

## Unprovoked, unasked for "monitoring" and "checking"
User-Agent: panopta.com
Disallow: /

## No "biomedical, biochemical, drug, health and disease related data" here.
##   192.31.21.179 switched from www.integromedb.org/Crawler to "Java/1.6.0_20"
##   and "-" after integromedb was added to robots.txt
User-Agent: www.integromedb.org/Crawler
Disallow: /

## Ambulance chasers with a stupid spider that hits the bad spider trap.
User-Agent: ip-web-crawler.com
Disallow: /

## Little public information
User-Agent: Findxbot
Disallow: /

## Don't know why it crawled me
User-Agent: ips-agent
Disallow: /

## Don't know why it crawled me
User-Agent: Go-http-client
Disallow: /
### End Get Off My Lawn ### }}}


### Plain Attacks ### {{{
## evil
User-Agent: ZmEu
Disallow: /

## evil
User-Agent: Morfeus
Disallow: /

## evil
User-Agent: Snoopy
Disallow: /
### End Plain Attacks ### }}}


User-agent: bot-pge.chlooe.com
Disallow: /

## Firewall anything that goes to the trap
User-agent: *
Allow: /
Disallow: /badbottrap
Disallow: /.well-known
